Background

This module builds on code contained in Coronavirus_Statistics_USAF_v007.Rmd. This file includes the latest code for analyzing data from USA Facts. USA Facts maintains county-level data on coronavirus cases and deaths in the US. Downloaded data are unique by county, with one column per date, and there is a separate file for each of cases, deaths, and population.

The intent of this code is to move updated functions to sourced files and to better manage memory.
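
To make the wide layout described above concrete, the following is a minimal base R sketch that reshapes a small, made-up table (one row per county, one column per date) into one row per county and date. The column names and values are assumptions for illustration only; the actual reshaping is handled inside the sourced functions, whose internals are not shown here.

```r
# Made-up wide data: one row per county, one column per date (illustration only)
wide <- data.frame(countyFIPS=c(1001, 1003),
                   `2020-01-22`=c(0, 1),
                   `2020-01-23`=c(2, 3),
                   check.names=FALSE
                   )

# Every column other than the county key is assumed to be a date
dateCols <- setdiff(names(wide), "countyFIPS")

# Stack the date columns so that the result is unique by county and date
long <- data.frame(countyFIPS=rep(wide$countyFIPS, times=length(dateCols)),
                   date=as.Date(rep(dateCols, each=nrow(wide))),
                   cases=unlist(wide[dateCols], use.names=FALSE)
                   )
long
```

The long form is what allows a uniqueness check by county and date, similar to the checks reported in the processing log.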

Sourcing Functions

The tidyverse library is loaded, and the functions used for CDC daily processing are sourced, along with functions specific to USA Facts:

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Functions are available in source file
source("./Generic_Added_Utility_Functions_202105_v001.R")
source("./Coronavirus_CDC_Daily_Functions_v002.R")
source("./Coronavirus_USAF_Functions_v002.R")

Further, colRenamer() is redefined to handle length-zero inputs, and the mapping file specific to USA Facts is sourced:

# Updated to handle length-zero inputs - also in CDC Daily v005
# Generic function to rename columns in a file using an input vector
colRenamer <- function(df, 
                       vecRename=c(), 
                       ...
                       ) {
    
    # FUNCTION ARGUMENTS:
    # df: the data frame or tibble
    # vecRename: vector for renaming c('existing name'='new name'), can be any length from 0 to ncol(df)
    # ...: additional arguments to be passed to rename_with
    
    # Rename the columns as requested
    if(length(vecRename)>0) dplyr::rename_with(df, .fn=function(x) vecRename[x], .cols=names(vecRename), ...)
    else df
    
}

source("./Coronavirus_USAF_Default_Mappings_v002.R")
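
As a quick check of the length-zero handling, the following sketch repeats the colRenamer() definition from above and applies it to a small made-up data frame (dfDemo and its columns are assumptions for illustration), once with a rename vector and once with the default empty vector:

```r
library(dplyr)

# Same definition as above, repeated so that the example is self-contained
colRenamer <- function(df, vecRename=c(), ...) {
    if(length(vecRename)>0) dplyr::rename_with(df, .fn=function(x) vecRename[x], .cols=names(vecRename), ...)
    else df
}

# Made-up data frame for illustration
dfDemo <- data.frame(countyFIPS=c(1001, 1003), State=c("AL", "AL"))

# Rename 'State' to 'state' using c('existing name'='new name')
renamed <- colRenamer(dfDemo, vecRename=c("State"="state"))
names(renamed)

# With the default length-zero vector, the data frame is returned as-is
unchanged <- colRenamer(dfDemo)
names(unchanged)
```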

Data Updates

The latest county-level burden data are downloaded:

readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20230208.csv", 
                 "usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20230208.csv"
                 )
compareList <- list("usafCase"=readFromRDS("cty_newdata_20230108")$dfRaw$usafCase, 
                    "usafDeath"=readFromRDS("cty_newdata_20230108")$dfRaw$usafDeath
                    )

# Use existing clusters
cty_newdata_20230208 <- readRunUSAFacts(maxDate="2023-02-06", 
                                        downloadTo=lapply(readList, 
                                                          FUN=function(x) if(file.exists(x)) NA else x
                                                          ),
                                        readFrom=readList, 
                                        compareFile=compareList, 
                                        writeLog="./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log", 
                                        ovrwriteLog=TRUE,
                                        useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
                                        skipAssessmentPlots=FALSE,
                                        brewPalette="Paired"
                                        )
## 
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20230208.csv
## Rows: 3193 Columns: 1115
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr    (3): County Name, State, StateFIPS
## dbl (1112): countyFIPS, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 2020...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation ideoms with `aes()`

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 34
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## Rows: 3193 Columns: 1115
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr    (3): County Name, State, StateFIPS
## dbl (1112): countyFIPS, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 2020...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 34
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230208_chk_v005.log
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 × 4
##   isType    cases     new_cases            n
##   <chr>     <dbl>         <dbl>        <dbl>
## 1 before 4.90e+10 97284771      3547423     
## 2 after  4.84e+10 95083869      3490762     
## 3 pctchg 1.20e- 2        0.0226       0.0160
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 × 4
##   isType  deaths   new_deaths            n
##   <chr>    <dbl>        <dbl>        <dbl>
## 1 before 6.74e+8 1082388      3547423     
## 2 after  6.46e+8 1002861      3490762     
## 3 pctchg 4.16e-2       0.0735       0.0160
## Warning: Using `all_of()` outside of a selecting function was deprecated in tidyselect
## 1.2.0.
## ℹ See details at
##   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.

## NULL

# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_newdata_20230208$useClusters, 
                       caption="Includes only counties with 25k+ population",
                       brewPalette="viridis"
                       )

# Save the refreshed file
saveToRDS(cty_newdata_20230208, ovrWriteError=FALSE)
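
In the readRunUSAFacts() call above, downloadTo is built so that a path is passed only when the file does not yet exist locally; otherwise NA is passed, which the log message ("No file has been downloaded, will use existing file") suggests is treated as "skip the download". A minimal sketch of that pattern follows, using temporary files as stand-ins; demoList and both paths are hypothetical.

```r
# Hypothetical paths for illustration: one file exists and one does not
existingFile <- tempfile(fileext=".csv")
writeLines("countyFIPS,2020-01-22", existingFile)
missingFile <- tempfile(fileext=".csv")

demoList <- list("usafCase"=existingFile, "usafDeath"=missingFile)

# NA where the file already exists, the target path where it still needs downloading
downloadTo <- lapply(demoList, FUN=function(x) if(file.exists(x)) NA else x)
str(downloadTo)
```

Because the same list of paths is also passed as readFrom, re-running the chunk reads the existing local copy rather than downloading again.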

Vaccine data are also updated:

cty_vaxdata_20230209 <- processCountyVaccines(loc="./RInputFiles/Coronavirus/county_vaccine_20230209.csv", 
                                              ctyList=readFromRDS("cty_newdata_20230208"), 
                                              minDateCD=c("2022-06-09", "2022-06-09"),
                                              maxDateCD="2023-01-26"
                                              )
## Rows: 414347 Columns: 80
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): Date, FIPS, Recip_County, Recip_State, SVI_CTGY, Metro_status
## dbl (74): MMWR_week, Completeness_pct, Administered_Dose1_Recip, Administere...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## 
## Records from other than 50 states and DC:
## # A tibble: 9 × 2
##   state     n
##   <chr> <int>
## 1 AS      126
## 2 FM      127
## 3 GU      252
## 4 MH      126
## 5 MP      126
## 6 PR     9969
## 7 PW      126
## 8 VI      506
## 9 <NA>     81

## Warning: Removed 16 rows containing non-finite values (`stat_boxplot()`).

## Warning: Removed 16 rows containing non-finite values (`stat_boxplot()`).

## Warning: Removed 16 rows containing non-finite values (`stat_boxplot()`).
## 
## Count of NA records by column
##           state            FIPS popgte65_minpop popgte65_maxpop    popgte65_nnA 
##               0               0               0               0               0 
##               n 
##               0 
## 
## Records where minimum and maximum population differ
## # A tibble: 0 × 5
## # … with 5 variables: state <chr>, FIPS <chr>, age <chr>, minpop <dbl>,
## #   maxpop <dbl>
## 
## 
## 
## Will run with parameters:
## burdenVar: cpm dpm 
## vaxVar: vxcpoppct vxcpoppct 
## minDateCD: 2022-06-09 2022-06-09 
## maxDateCD: 2023-01-26 2023-01-26
## Warning: Using `all_of()` outside of a selecting function was deprecated in tidyselect
## 1.2.0.
## ℹ See details at
##   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>

## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 16 rows containing missing values (`geom_point()`).

## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 16 rows containing missing values (`geom_point()`).

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##        Min         1Q     Median         3Q        Max 
## -313131506   -1913551     266283    2800223  168848464 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28983.59    3272.65   8.856  < 2e-16 ***
## vaxMetric     156.60      50.59   3.096  0.00198 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11020000 on 3124 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.003058,   Adjusted R-squared:  0.002739 
## F-statistic: 9.584 on 1 and 3124 DF,  p-value: 0.001981
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##        Min         1Q     Median         3Q        Max 
## -313079816   -2168552     -27943    2554658  168198528 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## type<25k                34129.38   11883.21   2.872 0.004105 ** 
## type>500k               18078.44    7005.12   2.581 0.009904 ** 
## type100k-500k           26691.52    6982.55   3.823 0.000135 ***
## type25k-100k            30691.16    7874.97   3.897 9.93e-05 ***
## vaxMetric:type<25k        139.46     239.27   0.583 0.560040    
## vaxMetric:type>500k       305.47      99.28   3.077 0.002111 ** 
## vaxMetric:type100k-500k   188.19     112.16   1.678 0.093467 .  
## vaxMetric:type25k-100k    138.08     148.62   0.929 0.352936    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11020000 on 3118 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.5674, Adjusted R-squared:  0.5662 
## F-statistic: 511.1 on 8 and 3118 DF,  p-value: < 2.2e-16
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 16 rows containing missing values (`geom_point()`).

## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation: weight
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 16 rows containing missing values (`geom_point()`).

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -3701058   -22105     2245    34856   777649 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 486.4319    35.9334  13.537   <2e-16 ***
## vaxMetric    -5.1424     0.5554  -9.258   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 121000 on 3124 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.0267, Adjusted R-squared:  0.02639 
## F-statistic: 85.71 on 1 and 3124 DF,  p-value: < 2.2e-16
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -3646748   -29301    -6242    25901   766646 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## type<25k                 387.020    129.649   2.985 0.002857 ** 
## type>500k                301.573     76.428   3.946 8.13e-05 ***
## type100k-500k            259.678     76.181   3.409 0.000661 ***
## type25k-100k             420.757     85.918   4.897 1.02e-06 ***
## vaxMetric:type<25k        -1.708      2.610  -0.654 0.513066    
## vaxMetric:type>500k       -2.941      1.083  -2.715 0.006664 ** 
## vaxMetric:type100k-500k   -1.289      1.224  -1.053 0.292391    
## vaxMetric:type25k-100k    -2.779      1.622  -1.714 0.086621 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 120300 on 3118 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.1849, Adjusted R-squared:  0.1828 
## F-statistic: 88.41 on 8 and 3118 DF,  p-value: < 2.2e-16
# Save the refreshed file
saveToRDS(cty_vaxdata_20230209, ovrWriteError=FALSE)
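
The second model in each pair above uses the formula vaxMetric * type + 0 - vaxMetric, which drops the common intercept and the common slope so that each population bucket gets its own intercept and its own slope, while weights=pop gives larger counties more influence on the fit. A minimal sketch on made-up data shows the resulting coefficient layout; the values, the bucket labels, and the column name burden are assumptions for illustration and are not the county data.

```r
# Made-up data: burden is exactly linear in vaxMetric within each bucket (illustration only)
dfReg <- data.frame(type=rep(c("large", "small"), each=4),
                    vaxMetric=c(50, 60, 70, 80, 40, 50, 60, 70),
                    pop=c(900, 700, 800, 600, 10, 20, 15, 25)
                    )
dfReg$burden <- ifelse(dfReg$type=="large",
                       300 - 2*dfReg$vaxMetric,
                       400 - 1*dfReg$vaxMetric
                       )

# Population-weighted fit with one intercept and one slope per bucket
fit <- lm(burden ~ vaxMetric * type + 0 - vaxMetric, data=dfReg, weights=pop)

# Coefficients are ordered as intercepts by bucket, then slopes by bucket
coef(fit)
```

Reading the real output the same way, the type rows are bucket-specific intercepts and the vaxMetric:type rows are bucket-specific slopes.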

County-level data are post-processed:

cty_postdata_20230208 <- postProcessCountyData(lstCtyBurden=cty_newdata_20230208$dfPerCapita, 
                                               lstCtyVax=cty_vaxdata_20230209$vaxFix, 
                                               lstState=readFromRDS("cdc_daily_230202")$dfPerCapita, 
                                               excludeStates="AK"
                                               )
## 
## Parameter maxDate is: 2023-02-01
## Warning: Using `all_of()` outside of a selecting function was deprecated in tidyselect
## 1.2.0.
## ℹ See details at
##   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>

Additional post-processing steps are run:

# Step 1a: Burden comparisons for aggregated states
additionalCountyPostProcess(cty_postdata_20230208, p1CompareStates=c(state.abb, "DC"), p1AggData=TRUE)
## Warning: Using `all_of()` outside of a selecting function was deprecated in tidyselect
## 1.2.0.
## ℹ See details at
##   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>
## Warning: Removed 6 rows containing missing values (`geom_line()`).

# Step 1: Burden aggregation for key states
# Step 2: vaccine comparisons
# Step 3: Scoring updates (and errors)
# Step 4: New rolling data (28-day default with ceilings 50000 CPM, 500 DPM)
additionalCountyPostProcess(cty_postdata_20230208, 
                            p1CompareStates=c("GA", "FL", "NE", "IL", "OR"), 
                            p2VaxStates=c("MA", "HI", "VA", "VT", "RI", "NE"), 
                            p3VaxTimes=sort(c("2022-01-01", "2023-01-25")),
                            p4DF=cty_newdata_20230208$dfPerCapita, 
                            excludeStates=c("AK")
                            )
## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 6 rows containing missing values (`geom_line()`).

## Warning: Removed 379 rows containing missing values (`geom_line()`).
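
Step 4 above mentions rolling data with a 28-day default window and ceilings on CPM and DPM. How additionalCountyPostProcess() computes this is not shown here, so the following is only a base R sketch of one way a trailing rolling sum with a ceiling could work; rollSumCapped(), its arguments, and the toy values are hypothetical.

```r
# Hypothetical helper: trailing k-value rolling sum, capped at a ceiling
rollSumCapped <- function(x, k=28, cap=Inf) {
    stopifnot(k >= 1, length(x) >= k)
    cs <- cumsum(x)
    # Trailing sum over k values: cumulative sum minus the cumulative sum k steps earlier
    out <- cs - c(rep(0, k), head(cs, -k))
    # The first k-1 positions do not have a full window
    out[seq_len(k-1)] <- NA
    # Cap large values at the ceiling (NA values are preserved)
    pmin(out, cap)
}

# Toy example with a 3-value window and a ceiling of 10
rollSumCapped(1:6, k=3, cap=10)
```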

Memory is cleaned:

# List objects in memory, sorted by size in bytes
sapply(ls(), FUN=function(x) object.size(get(x))) %>% sort(decreasing=FALSE)
##               usafUpdatedURL                  usafMainURL 
##                          168                          232 
##             rawMakeVarMapper                     readList 
##                          592                          832 
##                       zeroNA               fullListMapper 
##                          840                          848 
##                     zeroPad2                     zeroPad5 
##                          896                          896 
##                  pivotMapper      checkControlGroupMapper 
##                         1024                         1056 
##         plotSimilarityMapper                  glimpseFile 
##                         1056                         1064 
##       checkControlVarsMapper                     fileRead 
##                         1344                         1344 
##                     uqMapper                 perCapMapper 
##                         1400                         1408 
##                    urlMapper             lstExcludeMapper 
##                         1488                         1544 
##                 combineFiles              vecSelectMapper 
##                         2072                         2160 
##                  colSelector                   glimpseLog 
##                         2184                         2688 
##                      zeroPad                 customYYYYMM 
##                         2744                         2856 
##                  vecToTibble                    genNewLog 
##                         3192                         3272 
##                sumImputedHHS                    renMapper 
##                         3368                         3488 
##             helperRollingAgg               lstComboMapper 
##                         3584                         3624 
##                    pivotData                 fileDownload 
##                         4256                         4264 
##                    skinnyHHS       createBurdenCountyDate 
##                         4272                         4480 
##          postProcessCDCDaily        processCountyVaccines 
##                         4480                         4496 
##                  readFromRDS                     printLog 
##                         4720                         4856 
##                   joinFrames                 getStateData 
##                         5040                         5272 
##                 lagCorrCheck              checkUniqueRows 
##                         5488                         5536 
##              helperPerCapita                    rowFilter 
##                         5824                         6216 
##                   colMutater                     cleanMem 
##                         6272                         6408 
##                  getClusters        checkSimilarityMapper 
##                         6832                         6912 
##               onePageCFRPlot            getCountyClusters 
##                         6928                         7280 
##                    saveToRDS              lstFilterMapper 
##                         7608                         7800 
##                  specSumProd               helperLinePlot 
##                         7896                         8152 
##                   colRenamer              clustersToFrame 
##                         8176                         8744 
##                       specNA          helperMakePerCapita 
##                         8888                         9016 
##                 checkControl               createGroupAgg 
##                         9192                         9296 
##                getCountyData                 testImputeNA 
##                         9608                         9632 
##                integrateData              createPerCapita 
##                        10304                        10512 
##               combineAggData             imputeNACapacity 
##                        10864                        11016 
##         pivotStateBurdenData        createRestatementData 
##                        11120                        11208 
##             helperSimilarity               plotSimilarity 
##                        11448                        11488 
##                    findPeaks          createVaxBurdenData 
##                        11984                        12496 
##        integrateStateVaccine            makeBurdenSummary 
##                        12768                        13736 
##                  combineRows createSummedCountyBurdenData 
##                        13776                        13968 
##             findDeltaFromMax            makeCaseHospDeath 
##                        14000                        14560 
##              checkSimilarity              clusterCounties 
##                        14736                        15072 
##               helperAggTrend               selfListMapper 
##                        17232                        17240 
##             diagnoseClusters               processRawFile 
##                        17472                        18480 
##               flagLargeDelta            plotByRestatement 
##                        18992                        19560 
##         cumulativeBurdenPlot  additionalCountyPostProcess 
##                        19728                        20304 
##        postProcessCountyData               helperAggTotal 
##                        20328                        20448 
##                tempStackPlot     hospitalCapacityCDCDaily 
##                        20736                        20792 
##           scoreVaxSimilarity       sparseCountyClusterMap 
##                        20968                        21072 
##         stateAgeVaxEvolution          repairVaxPopulation 
##                        21560                        22264 
##       downloadCountyVaccines                 createGeoMap 
##                        22528                        24880 
##                corrVaxBurden                  helperElbow 
##                        25096                        25592 
##                 keyAggMapper     downloadReadHospitalData 
##                        26624                        27312 
##            plotVaxBurdenData       plotCombineAggByMapper 
##                        27368                        28552 
##             plotDeltaFromMax           makeBurdenDatePlot 
##                        28992                        29048 
##             compareAggregate     compareStateSummedCounty 
##                        29776                        30616 
##            filterPopStateAge             hospAgePerCapita 
##                        31616                        31632 
##              scoreSimilarity                   plotCFRLag 
##                        32152                        32648 
##             helperSummaryMap                createSummary 
##                        32800                        33424 
##            readQCRawCDCDaily                findCorrAlign 
##                        33816                        35104 
##              readRunCDCDaily        cumulativeVaccinePlot 
##                        35624                        36752 
##              readRunUSAFacts      plotHospitalUtilization 
##                        38008                        38224 
##                   countyCorr                readQCRawUSAF 
##                        39064                        42776 
##              readPopStateAge            createBurdenPivot 
##                        43456                        44824 
##           peakValleyCDCDaily               makePeakValley 
##                        46208                        50344 
##                clusterStates            bucketPopStateAge 
##                        53696                        54760 
##      createDetailedSummaries        cty_postdata_20230208 
##                        77552                     25101640 
##         cty_vaxdata_20230209                  compareList 
##                     66616336                    385776176 
##         cty_newdata_20230208 
##                   1566152592
# Clean large objects
largeObjs <- c("cty_newdata_20230208", "cty_vaxdata_20230209")
cleanMem(largeObjs, delObjs=TRUE)
## 
## Memory usage prior to deleting files:
##             used   (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells   1083495   57.9    1912370  102.2   1373942   73.4
## Vcells 320673436 2446.6  638815820 4873.8 638626468 4872.4
## 
## Memory usage after deleting files:
##            used  (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells   971998  52.0    1912370  102.2   1373942   73.4
## Vcells 52861121 403.3  511052656 3899.1 638626468 4872.4
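
cleanMem() is defined in the sourced utility file and its implementation is not shown here. As a rough base R sketch of the same idea (list objects by size, remove the largest, then trigger garbage collection), assuming a hypothetical helper objSizes() and a throwaway environment demoEnv:

```r
# Hypothetical helper: object sizes in bytes for an environment, sorted ascending
objSizes <- function(env=globalenv()) {
    sizes <- vapply(ls(env),
                    FUN=function(x) as.numeric(object.size(get(x, envir=env))),
                    FUN.VALUE=numeric(1)
                    )
    sort(sizes)
}

# Throwaway environment holding one small and one large object
demoEnv <- new.env()
assign("small", 1:10, envir=demoEnv)
assign("large", numeric(1e6), envir=demoEnv)
before <- objSizes(demoEnv)

# Remove the large object by name, then ask R to release the memory
rm(list="large", envir=demoEnv)
invisible(gc())
after <- objSizes(demoEnv)

before
after
```

Passing names as a character vector to rm(list=...) mirrors how largeObjs is supplied to cleanMem() above.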
# List objects in memory, sorted by size in bytes
sapply(ls(), FUN=function(x) object.size(get(x))) %>% sort(decreasing=FALSE)
##               usafUpdatedURL                    largeObjs 
##                          168                          224 
##                  usafMainURL             rawMakeVarMapper 
##                          232                          592 
##                     readList                       zeroNA 
##                          832                          840 
##               fullListMapper                     zeroPad2 
##                          848                          896 
##                     zeroPad5                  pivotMapper 
##                          896                         1024 
##      checkControlGroupMapper         plotSimilarityMapper 
##                         1056                         1056 
##                  glimpseFile       checkControlVarsMapper 
##                         1064                         1344 
##                     fileRead                     uqMapper 
##                         1344                         1400 
##                 perCapMapper                    urlMapper 
##                         1408                         1488 
##             lstExcludeMapper                 combineFiles 
##                         1544                         2072 
##              vecSelectMapper                  colSelector 
##                         2160                         2184 
##                   glimpseLog                      zeroPad 
##                         2688                         2744 
##                 customYYYYMM                  vecToTibble 
##                         2856                         3192 
##                    genNewLog                sumImputedHHS 
##                         3272                         3368 
##                    renMapper             helperRollingAgg 
##                         3488                         3584 
##               lstComboMapper                    pivotData 
##                         3624                         4256 
##                 fileDownload                    skinnyHHS 
##                         4264                         4272 
##       createBurdenCountyDate          postProcessCDCDaily 
##                         4480                         4480 
##        processCountyVaccines                  readFromRDS 
##                         4496                         4720 
##                     printLog                   joinFrames 
##                         4856                         5040 
##                 getStateData                 lagCorrCheck 
##                         5272                         5488 
##              checkUniqueRows              helperPerCapita 
##                         5536                         5824 
##                    rowFilter                   colMutater 
##                         6216                         6272 
##                     cleanMem                  getClusters 
##                         6408                         6832 
##        checkSimilarityMapper               onePageCFRPlot 
##                         6912                         6928 
##            getCountyClusters                    saveToRDS 
##                         7280                         7608 
##              lstFilterMapper                  specSumProd 
##                         7800                         7896 
##               helperLinePlot                   colRenamer 
##                         8152                         8176 
##              clustersToFrame                       specNA 
##                         8744                         8888 
##          helperMakePerCapita                 checkControl 
##                         9016                         9192 
##               createGroupAgg                getCountyData 
##                         9296                         9608 
##                 testImputeNA                integrateData 
##                         9632                        10304 
##              createPerCapita               combineAggData 
##                        10512                        10864 
##             imputeNACapacity         pivotStateBurdenData 
##                        11016                        11120 
##        createRestatementData             helperSimilarity 
##                        11208                        11448 
##               plotSimilarity                    findPeaks 
##                        11488                        11984 
##          createVaxBurdenData        integrateStateVaccine 
##                        12496                        12768 
##            makeBurdenSummary                  combineRows 
##                        13736                        13776 
## createSummedCountyBurdenData             findDeltaFromMax 
##                        13968                        14000 
##            makeCaseHospDeath              checkSimilarity 
##                        14560                        14736 
##              clusterCounties               helperAggTrend 
##                        15072                        17232 
##               selfListMapper             diagnoseClusters 
##                        17240                        17472 
##               processRawFile               flagLargeDelta 
##                        18480                        18992 
##            plotByRestatement         cumulativeBurdenPlot 
##                        19560                        19728 
##  additionalCountyPostProcess        postProcessCountyData 
##                        20304                        20328 
##               helperAggTotal                tempStackPlot 
##                        20448                        20736 
##     hospitalCapacityCDCDaily           scoreVaxSimilarity 
##                        20792                        20968 
##       sparseCountyClusterMap         stateAgeVaxEvolution 
##                        21072                        21560 
##          repairVaxPopulation       downloadCountyVaccines 
##                        22264                        22528 
##                 createGeoMap                corrVaxBurden 
##                        24880                        25096 
##                  helperElbow                 keyAggMapper 
##                        25592                        26624 
##     downloadReadHospitalData            plotVaxBurdenData 
##                        27312                        27368 
##       plotCombineAggByMapper             plotDeltaFromMax 
##                        28552                        28992 
##           makeBurdenDatePlot             compareAggregate 
##                        29048                        29776 
##     compareStateSummedCounty            filterPopStateAge 
##                        30616                        31616 
##             hospAgePerCapita              scoreSimilarity 
##                        31632                        32152 
##                   plotCFRLag             helperSummaryMap 
##                        32648                        32800 
##                createSummary            readQCRawCDCDaily 
##                        33424                        33816 
##                findCorrAlign              readRunCDCDaily 
##                        35104                        35624 
##        cumulativeVaccinePlot              readRunUSAFacts 
##                        36752                        38008 
##      plotHospitalUtilization                   countyCorr 
##                        38224                        39064 
##                readQCRawUSAF              readPopStateAge 
##                        42776                        43456 
##            createBurdenPivot           peakValleyCDCDaily 
##                        44824                        46208 
##               makePeakValley                clusterStates 
##                        50344                        53696 
##            bucketPopStateAge      createDetailedSummaries 
##                        54760                        77552 
##        cty_postdata_20230208                  compareList 
##                     25101640                    385776176
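
A listing like the one above, with object sizes in bytes sorted ascending, can be produced with base R. The sketch below is an assumption about how such a table might be generated and is not taken from the sourced files; the helper name objSizes is hypothetical.

```r
# Hypothetical helper (not part of the sourced function files):
# returns the size in bytes of every object in an environment, sorted ascending
objSizes <- function(env = globalenv()) {
    nms <- ls(envir = env)
    sizes <- vapply(nms,
                    FUN = function(x) as.numeric(object.size(get(x, envir = env))),
                    FUN.VALUE = numeric(1)
                    )
    sort(sizes)
}

# Toy environment with one small and one larger object
demoEnv <- new.env()
assign("small", 1L, envir = demoEnv)
assign("big", numeric(1000), envir = demoEnv)
demoSizes <- objSizes(demoEnv)
demoSizes
```

The largest objects appear last, which makes candidates for rm() followed by gc() easy to spot.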

The latest county-level burden data are read and processed; each file is downloaded only if it does not already exist locally:

readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20230308.csv", 
                 "usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20230308.csv"
                 )
# Read the prior run once and keep only the raw case and death files
compareList <- readFromRDS("cty_newdata_20230208")$dfRaw[c("usafCase", "usafDeath")]

# Use existing clusters
cty_newdata_20230308 <- readRunUSAFacts(maxDate="2023-03-06", 
                                        downloadTo=lapply(readList, 
                                                          FUN=function(x) if(file.exists(x)) NA else x
                                                          ),
                                        readFrom=readList, 
                                        compareFile=compareList, 
                                        writeLog="./RInputFiles/Coronavirus/USAF_NewData_20230308_chk_v005.log", 
                                        ovrwriteLog=TRUE,
                                        useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
                                        skipAssessmentPlots=FALSE,
                                        brewPalette="Paired"
                                        )
## 
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20230308.csv
## Rows: 3193 Columns: 1115
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr    (3): County Name, State, StateFIPS
## dbl (1112): countyFIPS, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 2020...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date
## Warning: There was 1 warning in `summarize()`.
## ℹ In argument: `across(.cols = all_of(useVars), .fns = fn, ...)`.
## ℹ In group 1: `date = 2020-01-22`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
## 
##   # Previously
##   across(a:b, mean, na.rm = TRUE)
## 
##   # Now
##   across(a:b, \(x) mean(x, na.rm = TRUE))
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation ideoms with `aes()`

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 0
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20230308_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230308_chk_v005.log
## Warning in left_join(., df, by = names(univData)): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 1 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.
## Warning in left_join(., ref, by = names(univData)): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 1 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.
## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230308_chk_v005.log
## 
## 
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20230308.csv
## Rows: 3193 Columns: 1115
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr    (3): County Name, State, StateFIPS
## dbl (1112): countyFIPS, 2020-01-22, 2020-01-23, 2020-01-24, 2020-01-25, 2020...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 0
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20230308_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230308_chk_v005.log
## Warning in left_join(., df, by = names(univData)): Each row in `x` is expected to match at most 1 row in `y`.
## Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 1 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.
## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 0 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20230308_chk_v005.log
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 × 4
##   isType    cases     new_cases            n
##   <chr>     <dbl>         <dbl>        <dbl>
## 1 before 4.90e+10 97284771      3547423     
## 2 after  4.84e+10 95083869      3490762     
## 3 pctchg 1.20e- 2        0.0226       0.0160
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 × 4
##   isType  deaths   new_deaths            n
##   <chr>    <dbl>        <dbl>        <dbl>
## 1 before 6.74e+8 1082388      3547423     
## 2 after  6.46e+8 1002861      3490762     
## 3 pctchg 4.16e-2       0.0735       0.0160
## Warning: Using `all_of()` outside of a selecting function was deprecated in tidyselect
## 1.2.0.
## ℹ See details at
##   <https://tidyselect.r-lib.org/reference/faq-selection-context.html>
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
## ℹ Please use the `linewidth` argument instead.

## NULL
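
The pctchg rows in the before/after tables above appear to be the share of each total removed by the filtering rules, that is, 1 - after / before. This is an inference from the printed numbers, not a confirmed reading of the function. A minimal check, using two totals copied from the cases table:

```r
# Totals copied from the cases table in the output above
before <- c(new_cases = 97284771, n = 3547423)
after  <- c(new_cases = 95083869, n = 3490762)

# Assumed definition: fraction of the original total dropped by filtering
pctchg <- 1 - after / before
round(pctchg, 4)
```

Under this definition the values agree with the printed pctchg row to the precision shown.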

# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_newdata_20230308$useClusters, 
                       caption="Includes only counties with 25k+ population",
                       brewPalette="viridis"
                       )

# Save the refreshed file
saveToRDS(cty_newdata_20230308, ovrWriteError=FALSE)

The USA Facts data have not been updated since February 5, 2023, so this monthly refresh process may have run its course.
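
One way to confirm the last reported date is to look at the date-named columns of the wide download, which has one column per date. The sketch below uses a toy data frame with made-up values as a stand-in for the real file; the helper name latestDateColumn is hypothetical.

```r
# Toy stand-in for the wide USA Facts layout (one column per date); values are made up
dfWide <- data.frame(countyFIPS = c(1001, 1003),
                     `2023-02-04` = c(10, 20),
                     `2023-02-05` = c(11, 21),
                     check.names = FALSE
                     )

# Hypothetical helper: latest column name that looks like a YYYY-MM-DD date
latestDateColumn <- function(df) {
    isDate <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", names(df))
    max(as.Date(names(df)[isDate]))
}

lastDate <- latestDateColumn(dfWide)
lastDate
```

Comparing this value across successive downloads shows whether the source is still being refreshed.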